fix(esc): mid-stream cancellation for OpenAI-compatible + Minimax providers #145
Merged
Conversation
PR #144 added abort-signal-aware streaming for ``AnthropicProvider``, but ``OpenAICompatibleProvider`` and ``MinimaxProvider`` got only the pre-call fast-path. Users running through LiteLLM / GLM / OpenAI / DeepSeek — the most common "OpenAI-compatible proxy → Claude" stack — still saw ESC wait the full model latency before the post-API abort check fired. Same 20+ second symptom as before #144.

Port the response-close listener pattern from #144's AnthropicProvider:

* Register a listener on the abort signal that calls ``stream.response.close()`` to close the underlying HTTP socket. Closing the socket interrupts the SDK's blocking next-chunk read so the iterator raises immediately, even when the model is in a multi-second gap between chunks (extended thinking, tool_use generation).
* For OpenAI-compatible providers, additionally add an in-loop ``if abort_signal.aborted: break`` check at the top of each ``for chunk in stream`` iteration. Covers the case where chunks arrive back-to-back fast enough that the listener's close lands one iteration late, or where the SDK has already prefetched chunks past the close point.
* Signal-state-authoritative exception translation in the ``except Exception`` block — different SDK versions raise different exception classes when the response is closed mid-read, so the signal is the only stable abort indicator.
* Register-then-recheck ordering closes the sub-microsecond race where ``_fire`` can snapshot the listener list and silently drop a freshly-appended listener.
* A ``finally`` block detaches the listener so long-lived controllers (the REPL engine's, reused across many turns) don't accumulate dead listeners.

Minimax wraps the anthropic SDK against its compatible endpoint, so it gets the AnthropicProvider treatment (no in-loop check — the ``with client.messages.stream(...) as stream:`` pattern only exposes ``text_stream``, not a generic iterator).

Ten regression tests pin the contract:

* ``test_openai_compat_abort_signal.py`` (6 tests) — pre-abort fast-path with leaf-level ``assert_not_called()``, mid-stream close via response.close + timing bound, **load-bearing** in-loop check (asserts ``on_text_chunk`` saw only "first", not "second" — mutation-verified by deleting the in-loop check and watching the test fail), normal-completion regression check, ``abort_signal=None`` legacy parity, listener detachment.
* ``test_minimax_abort_signal.py`` (4 tests) — same shape as AnthropicProvider, with ``_ensure_client`` as the fast-path sentinel.

Three-way duplication of the close-listener pattern (Anthropic, Minimax, OpenAI-compat) is acknowledged. Extracting a shared helper is left as a follow-up — the three providers' surrounding contexts differ enough (Anthropic has the watchdog + non-streaming fallback, OpenAI-compat has bare-iterator semantics, Minimax has the ``with``-block + ``get_final_message``) that a premature extraction would either grow the helper to a 4-knob API or leak abstraction.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
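For orientation, here is a minimal sketch of the pattern as it applies to the OpenAI-compatible path. It is not the repo's actual code: `AbortError`, the signal API (`aborted` / `add_listener` / `remove_listener`), and how the abort is finally surfaced to the caller are assumptions based on the description above (the real listener registration reportedly passes `once=True`).

```python
# Sketch only. Assumed, not from the repo: AbortError, the AbortSignal API
# (.aborted / .add_listener / .remove_listener), and the OpenAI-style chunk
# shape chunk.choices[0].delta.content.
from openai import OpenAI


class AbortError(Exception):
    """Raised when the caller aborted the request (hypothetical name)."""


class OpenAICompatibleProviderSketch:
    def __init__(self, base_url, api_key, model):
        self.client = OpenAI(base_url=base_url, api_key=api_key)
        self.model = model

    def chat_stream_response(self, messages, abort_signal=None, on_text_chunk=None):
        # Pre-call fast-path: never open the request if ESC already fired.
        if abort_signal is not None and abort_signal.aborted:
            raise AbortError("aborted before request")

        stream = self.client.chat.completions.create(
            model=self.model, messages=messages, stream=True
        )

        def _close_response():
            # Closing the underlying httpx response unblocks the SDK's
            # next-chunk read, so the iterator raises almost immediately.
            try:
                stream.response.close()
            except Exception:
                pass  # best effort; the in-loop check below still applies

        attached = False
        if abort_signal is not None:
            abort_signal.add_listener(_close_response)  # register first ...
            attached = True
            if abort_signal.aborted:                    # ... then recheck, so an abort
                _close_response()                       # racing the registration isn't lost

        parts = []
        try:
            for chunk in stream:
                # In-loop check: covers chunks the SDK already prefetched
                # past the point where the close listener fired.
                if abort_signal is not None and abort_signal.aborted:
                    break
                if not chunk.choices:
                    continue
                text = chunk.choices[0].delta.content or ""
                if text:
                    parts.append(text)
                    if on_text_chunk:
                        on_text_chunk(text)
        except Exception:
            # Signal-state-authoritative translation: SDK versions differ in what
            # they raise when the response is closed mid-read, so trust the signal.
            if abort_signal is not None and abort_signal.aborted:
                raise AbortError("aborted mid-stream")
            raise
        finally:
            if attached:
                abort_signal.remove_listener(_close_response)  # keep long-lived signals clean

        if abort_signal is not None and abort_signal.aborted:
            raise AbortError("aborted mid-stream")
        return "".join(parts)
```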
Summary
PR #144 added abort-signal-aware streaming for `AnthropicProvider`, but `OpenAICompatibleProvider` and `MinimaxProvider` got only the pre-call fast-path. Users on LiteLLM (OpenAI-compatible) → Claude — the user who reported the lingering 10s ESC pause is on this exact stack — still saw ESC wait the full model latency before the outer query loop's abort check fired.

User-reported symptom
After #144 merged, a user testing with LiteLLM → Anthropic Claude Opus 4.7 reported ESC still took ~10s during the model's "Thinking…" phase. The call path went through `openai_compatible.py` (LiteLLM exposes an OpenAI-compatible API), not `anthropic_provider.py`, so #144's listener never registered. This PR closes that gap.

Changes
`src/providers/openai_compatible.py` — two defenses in `chat_stream_response`:

* Response-close listener: registered on the `abort_signal` via `add_listener(..., once=True)`. Calls `stream.response.close()` to close the underlying httpx socket, so the SDK's blocking next-chunk read raises immediately. Handles the user's exact case (long gap between chunks during extended thinking / tool_use generation).
* In-loop check: `if abort_signal.aborted: break` at the top of each `for chunk in stream:` iteration. Catches the SDK-prefetched-chunks case where the listener's close lands one iteration late.
* Signal-state-authoritative exception translation in the `except Exception` block.
* A `finally` block detaches the listener so long-lived controllers don't accumulate listeners.

`src/providers/minimax_provider.py` — Minimax uses the anthropic SDK against its compatible endpoint, so it gets the AnthropicProvider treatment (response-close listener; no in-loop check needed because the `with ... as stream:` only exposes `text_stream`).

`tests/test_openai_compat_abort_signal.py` (new, 6 tests):

* pre-abort fast-path never reaches `client.chat.completions.create` (leaf-level `assert_not_called()`)
* mid-stream abort: `stream.response.close()` was called, within a timing bound
* in-loop check: with an `on_text_chunk` callback, asserts `seen == ["first"]`, not `["first", "second"]` (a sketch of this test's shape follows this list). Mutation-tested by deleting the in-loop check and watching the test fail with the exact expected message.
* normal-completion regression check
* `abort_signal=None` legacy parity
* listener detachment

`tests/test_minimax_abort_signal.py` (new, 4 tests): same shape as the Anthropic tests, with `_ensure_client` as the fast-path sentinel.
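A sketch of what the load-bearing in-loop-check test might look like. The `provider` and `abort_signal` pytest fixtures, the MagicMock-backed client, and the fake chunk shape are illustrative assumptions, not the repo's fixtures.

```python
# Illustrative only: assumes a `provider` fixture whose `client` is a MagicMock
# and an `abort_signal` fixture exposing .aborted / .abort(); both hypothetical.
from types import SimpleNamespace
from unittest.mock import MagicMock


def _chunk(text):
    # Mimics the OpenAI-style streaming chunk shape: chunk.choices[0].delta.content
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


def test_in_loop_check_stops_after_first_chunk(provider, abort_signal):
    seen = []

    def on_text_chunk(text):
        seen.append(text)
        abort_signal.abort()  # fire ESC as soon as the first chunk arrives

    fake_stream = MagicMock()
    fake_stream.__iter__.return_value = iter([_chunk("first"), _chunk("second")])
    provider.client.chat.completions.create.return_value = fake_stream

    try:
        provider.chat_stream_response(
            messages=[], abort_signal=abort_signal, on_text_chunk=on_text_chunk
        )
    except Exception:
        pass  # the provider may surface the abort as an exception; only the callback contract is pinned here

    # Without the in-loop check, the prefetched "second" chunk would leak through.
    assert seen == ["first"]
```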
Test plan

Follow-up (deferred)
Three-way duplication of the response-close-listener pattern across `AnthropicProvider`, `MinimaxProvider`, and `OpenAICompatibleProvider`. The three contexts differ enough (Anthropic has the watchdog + non-streaming fallback, OpenAI-compat has the bare iterator + in-loop check, Minimax has the `with`-block + `get_final_message`) that premature extraction would either grow the helper to a 4-knob API or leak abstraction. Will file as a separate refactor PR.

🤖 Generated with Claude Code